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Abstract. We prove several theorems to give sufficient conditions for convergence 



of quantum annealing, which is a protocol to solve generic optimization problems by 
quantum dynamics. In particular the property of strong ergodicity is proved for the 
path-integral Monte Carlo implementation of quantum annealing for the transverse 
Ising model under a power decay of the transverse field. This result is to be compared 
with the much slower inverse-log decay of temperature in the conventional simulated 
^ \ annealing. Similar results are proved for the Green's function Monte Carlo approach, 

^t" 1 Optimization problems in continuous space of particle configurations are also discussed. 



1. Introduction 



One of the central problems in computer science is to develop efficient algorithms for 
hard optimization problems pQ. A standard approach is to propose a new algorithm for 
a given specific problem by improving existing methods or by devising new approaches. 
Simulated annealing (SA) presents a different perspective, which provides a generic 
algorithm to be applicable in principle to an arbitrary problem The basic idea is to 

numerically simulate a physical annealing process by the introduction of a temperature 
variable under the identification of the cost function to be minimized with the energy of 
the system. One decreases the temperature from a very high initial value toward zero 
as the simulation time proceeds with the hope to reach the optimal state (ground state) 
at the end of the process. 

The efficiency of SA is determined by the choice of the annealing schedule, the 
rate of temperature decrease. A very slow decrease would certainly lead the system 
to the ground state because the system stays close to equilibrium at each temperature. 
However, such a slow process is not very useful practically. On the other hand, when the 
temperature is decreased too quickly, the system may be trapped in a local minimum. It 
is therefore important to establish criteria on how fast one can decrease the temperature 
to reach the optimal state avoiding local minima. 

A theorem by Geman and Geman [3] gives a generic answer to this problem. Any 
system is guaranteed to converge to the optimal state in the limit of infinite time if the 
temperature is decreased in proportion to N/ logt or slower, where N is the system size 
and t denotes simulation steps. This result is highly non-trivial since the system reaches 
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the equilibrium state (ground state) after a long non-equilibrium process in which the 
temperature changes with time at a finite, non-vanishing, rate. 

Quantum annealing (QA) is a relatively new alternative to SA, which uses quantum 
fluctuations, instead of thermal effects, to search the phase space of the system for the 
optimal state El HI El EE QUI- An artificial term of kinetic energy of quantum nature 
is introduced, by which the system moves around in the phase space. The cost function 
is regarded as the potential energy. A slow decrease of the kinetic energy is expected 
to bring the system towards the optimal state. A related method of quantum adiabatic 
evolution [TT] is based on essentially the same idea. 

A remarkable fact is that QA has been found to be more effective in solving 
optimization problems than SA in most cases numerically investigated so far, including 
the ground state search of random spin systems O HBJ EH EE] , protein folding [Hj , the 
configuration of molecules in a Lennard- Jones cluster travelling salesman problem 
[T5] , simple potentials |2U] and a kinetically constrained system [2T] . It has also been 
observed experimentally that a QA-like process leads to equilibrium more efficiently 
than a thermal process [221 ■ In contrast, in the instance of 3-SAT, a hard optimization 
problem, QA has been found not to outperform SA It is therefore a very interesting 
problem to establish when and how QA converges to the ground state, preferably with 
a comparison with SA in mind. 

In the present paper we report on a solution of this problem by proving several 
theorems which give sufficient conditions for convergence of QA. In many numerical 
studies of QA, stochastic processes are used in the forms of path-integral and Green's 
function Monte Carlo simulations mainly due to difficulties in directly solving the 
Schrodinger equation for large systems. Our approach reflects such developments, and 
we derive convergence conditions for Monte Carlo implementations of QA using the idea 
of Geman and Geman for convergence conditions for SA. 

This paper consists of five sections. Various definitions of an inhomogeneous Markov 
chain are given in the next section. Convergence of QA is proved for the path-integral 
and the Green function Monte Carlos in section 3 and section 4, respectively. The last 
section is devoted to discussions. 



2. Ergodicity of inhomogeneous Markov chain 

Since we use the theory of stochastic processes, it is useful to recall various definitions 
and theorems for inhomogeneous Markov processes j3] . We denote the space of discrete 
states by S and assume that the size of S is finite. A Monte Carlo step is characterized 
by the transition probability from state x(g S) to state y(e S) at time step t: 

f P(y,x)A(y,x;t) (x ^ y) 

G(y,x;t) = I i-J2P{z,x)A{z,x;t) {x = y) > t 2 ' 1 ) 

where P(y, x) and A(y, x\ t) are called the generation probability and the acceptance 
probability, respectively. The former is the probability to generate the next candidate 
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state y from the present state x. We assume that this probability does not depend on 
time and satisfies the following conditions: 

Vx,yeS:P(y,x)=P(x,y)>0, (2.2) 
Vx G S : P(x, x) = 0, (2.3) 

VxeS: y £P(y,x) = l, (2.4) 

yes 

71-1 

Vx, y G 5, 3n > 0, 3zi, • • • , z n _i G 5 : Y[ P{zk+l, z k) > 0, z = x, z n = y. (2.5) 

k=0 

The last condition represents irreducibility of S, that is, any state in S can be reached 
from any other state in S. 

We define S x as the neighbourhood of x, i.e., the set of states that can be reached 
by a single step from x: 

S x = {y\yeS,P(y,x)>0}. (2.6) 

The acceptance probability A(y, x; t) is the probability to accept the candidate y 
generated from state x. The matrix G(t), whose (y,x) component is given by (J2.ll) . 
[G(t)]y tX = G(y,x;t), is called the transition matrix. 

Let V denote the set of probability distributions on S. We regard a probability 
distribution p(G V) as the column vector with the component [p] x = p(x). The 
probability distribution at time t, started from an initial distribution po(& V) at time 
to, is written as 

p(t, t ) = G^Vo = G(t - l)G(t - 2) • • • G?(* )po. (2.7) 

A Markov chain is called inhomogeneous when the transition probability depends on 
time. In sections 3 and 4, we will prove that inhomogeneous Markov chains associated 
with QA are ergodic under appropriate conditions. There are two kinds of ergodicity, 
weak and strong. Weak ergodicity means that the probability distribution becomes 
independent of initial conditions after a sufficiently long time: 

Vt > : lim sup{ \\p(t, t ) - p'(t, t )\\\ p , p' G V} = 0, (2.8) 

t — >oo 

where p(t,t ) and p'(t,t Q ) are the probability distributions with different initial 
distributions po and p' . The norm is defined by 

||p|| = £b(x)l- (2.9) 

Strong ergodicity is the property that the probability distribution converges to a unique 
distribution irrespective of initial state: 

3r G V,Vt > : lim sup{||p(t, t Q ) - rll | po G V} = 0. (2.10) 

t — >oo 

The following two theorems provide conditions for weak and strong ergodicity of 
an inhomogeneous Markov chain 
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Theorem 1 (Condition for weak ergodicity) An inhomogeneous Markov chain is 
weakly ergodic if and only if there exists a strictly increasing sequence of positive numbers 
{ti}, (i = 0, 1, 2, . . .), such that 

oo 

£ (l - a(G u+1 > u )) — ► oo, (2.11) 

i=0 

where a(G ti+1,ti ) is the coefficient of ergodicity defined by 

a(G u+1 > u ) = l-mml^2mm{G(z,x),G(z,y)}\x,y E s\ (2.12) 
with the notation G(z,x) = [G u+1,ti } z ^ x . 

Theorem 2 (Condition for strong ergodicity) An inhomogeneous Markov chain 
is strongly ergodic if the following three conditions hold: 

(i) the Markov chain is weakly ergodic, 

(ii) for all t there exists a stationary state pt G P such that pt = G(t)p t , 
(Hi) p t satisfies 

oo 

£ \\Pt -Pt+i\\ < oo. (2.13) 
t=o 

Moreover, if p = lim p t , then p is equal to the probability distribution r in A2.1U\) . 

3. Quantum annealing with path-integral Monte Carlo method 

3.1. Path-integral Monte Carlo method 

Let us first discuss convergence conditions for the implementation of quantum annealing 
by the path-integral Monte Carlo (PIMC) method [24J. The basic idea of PIMC is 
to apply the Monte Carlo method to the classical system obtained from the original 
quantum system by the path-integral formula. It is instructive to first consider the 
example of ground state search of the Ising spin system as a typical combinatorial 
optimization problem. The Ising system with generic interactions as discussed below 
covers a wide range of problems in combinatorial optimization. Examples include the 
ground-state search of spin glasses, travelling salesman problem, neural networks and 
the satisfiability problem, many of which have been treated in the Ising expression in 
the literature mentioned in Introduction. 

Quantum fluctuations are introduced by adding a transverse field to the usual Ising 
spin system. The Hamiltonian of the transverse-field Ising model (TFIM) thus obtained 
is written as 

N 

tf(*) = -£-V?*?-r(f)£*f, (3-i) 

(ij) i=0 

where the af (a = x, y, z) are the Pauli matrices, components of the spin | operator at 
site i, and J^- denotes the coupling constant between sites i and j. There is no restriction 
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in the spatial dimensionality and the lattice structure. It is also to be noted that the 
existence of arbitrary many-body interactions between z components of Pauli matrix 
and longitudinal random magnetic field hicr*, in addition to the above Hamiltonian, 
would not change the following argument. 

The first term of the right-hand side of (j3.1j) is the cost function (or potential) 
to be minimized. The transverse field T(t) represents the strength of kinetic energy of 
quantum nature, which induces spin flips between up and down states measured in the 
z direction. In the QA, T(t) is gradually reduced from a very large (or infinitely large) 
initial value to zero as time proceeds. By starting from the trivial ground state of the 
initial system composed only of the transverse-field term — F(t) Y^i°~i an d following the 
time development of the system under a slow decrease of the transverse field, one hopes 
to eventually reach the non-trivial ground state of the original problem, — Jij°~i°~j, 
when T(t) vanishes. An important problem is how slow is sufficiently slow to achieve 
this goal. 

In the path-integral method, the rf-dimensional TFIM is mapped to a (d + 1)- 
dimensional classical Ising system so that the quantum system can be simulated on 
classical computer. In numerical simulations, the Suzuki- Trotter formula j2EJ EH] is 
usually employed to express the partition function of the resulting classical system, 

(n M M N \ 

j7 E E J^s™ + 7 (t) E E ^ (fc+1) , (3.2) 
m k=l (ij) k=l i=0 ) 

where M is the length along the extra dimension (Trotter number) and S^ k \= ±1) 
denotes a classical Ising spin at site i on the kth Trotter slice. The nearest-neighbour 
interaction between adjacent Trotter slices, 

7(*) = ^log(coth^V (3.3) 

is ferromagnetic. This approximation ()3.2|) becomes exact in the limit M — > oo for a 
fixed (3 = 1/ksT. The magnitude of this interaction ()3.3|) increases with time t and tends 
to infinity as t — > oo, reflecting the decrease of T(t). We fix M and f3 to arbitrary large 
values, which corresponds to the actual situation in numerical simulations. Therefore 
the theorem presented below does not directly guarantee the convergence of the system 
to the true ground state, which is realized only after taking the limits M — > oo and 
(3 — > oo. We will rather show that the system converges to the thermal equilibrium 
represented by the right-hand side of ()3.2|) . which can be chosen arbitrarily close to the 
true ground state by taking M and (3 large enough. 

With the above example of TFIM in mind, it will be convenient to treat a more 
general expression than ()3.2|) . 

Here F (x) is the cost function whose global minimum is the desired solution of the 
combinatorial optimization problem. The temperature T is chosen to be sufficiently 
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small. The term Fi(x) derives from the kinetic energy, which is the transverse field in 
the TFIM. Quantum fluctuations are tuned by the extra temperature factor Ti(t), which 
decreases with time. The first term — Fq{x)/Tq corresponds to the interaction term in 
the exponent of (|3.2|) . and the second term — F 1 (x)/T 1 (t) generalizes the transverse- field 
term in (|3.2)l . 

For the partition function ([3.4)1 . we define the acceptance probability of PIMC as 
A{y ' X]t)=9 {^§l))' (3 ' 5) 

q{x > t] = w) P \^~W))- (3 - 6) 

This q(x;t) is the equilibrium Boltzmann factor at a given fixed Ti(t). The function 
g(u) is the acceptance function, a monotone increasing function satisfying < g(u) < 1 
and g(l/u) = g(u)/u for u > 0. For instance, for the heat bath and the Metropolis 
methods, we have 

u 

9(u) = t— , (3.7) 
1 + u 

g(u) = min{l,w}, (3.8) 

respectively. The conditions mentioned above for guarantee that q(x; t) is the 
stationary distribution of the homogeneous Markov chain defined by the transition 
matrix G(t) with a fixed t. In other words, q(x; t) is the right eigenvector of G(t) 
with eigenvalue 1. 

3.2. Convergence theorem for QA-PIMC 

We first define a few quantities. The set of local maximum states of F\ is written as S m , 

S m = {x\xeS, Wye S x , F x {y) < F 1 (x)} . (3.9) 

We denote by d(y, x) the minimum number of steps necessary to make a transition from 
x to y. Using this notation we define the minimum number of maximum steps needed 
to reach any other state from an arbitrary state in the set S \ S m , 

R = minjmax {d(y, x) \ y G S} xES\S m \. (3.10) 

Also, L and L\ stand for the maximum changes of Fq(x) and Fi(x), respectively, in a 
single step, 

L = max{|F (a;)-Fo(y)| \P(y,x)>0, x,yeS}, (3.11) 
L 1 = max{\F 1 {x)-F 1 (y)\ \P(y,x)>0, x,yeS}. (3.12) 
Our main results are summarized in the following theorem and its corollary. 

Theorem 3 (Strong ergodicity of the system (|3.4|) ^ The inhomogeneous Markov 
chain generated by \3. <5j) and \3. 6)) is strongly ergodic and converges to the equilibrium 
state corresponding to the first term of the right-hand side of i3.6}) . exp(— F (x)/T ), if 

T ' (i) £ v&U- (3 ' 13) 
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Application of this theorem to the PIMC implementation of QA represented by (|3.2j) 
immediately yields the following corollary. 

Corollary 1 (Strong ergodicity of QA-PIMC for TFIM) The inhomogeneous Markov 
chain generated by the Boltzmann factor on the right-hand side of \S. ty) is strongly er- 
godic and converges to the equilibrium state corresponding to the first term on the right- 
hand side of \3. j| ) if 

r (*)>f tanh_1 (^7^- (3 ' 14) 

Remark. For sufficiently large t, the above inequality reduces to 

r(t) > ^-{t + 2)- 2 / RL \ (3.15) 

P 

This result implies that a power decay of the transverse field is sufficient to guarantee 
the convergence of quantum annealing of TFIM by the PIMC. 

To prove strong ergodicity it is necessary to prove weak ergodicity first. The 
following lemma is useful for this purpose. The proof of this lemma is given in 
| Appendix A| 

Lemma 1 (Lower bound on the transition probability) The elements of the 
transition matrix defined by \2. hS. ,5|) and IIS. 6]) have the following lower bound: 

P(y, x) > => Vt > : G(y, x;t)>w g{\) exp f - A- J , (3.16) 

and 

T m) 



3*1 > 0,Vx E S\S m ,Vt > t x : G(a;,x;t) > w^(l)exp (-^ - ] . (3.17) 



Proof of weak ergodicity implied in Theorem^ Let us introduce the following quantity 

x* = argminjmax {d(y, x) \ y G S} xG<S\5 m j. (3.18) 

Comparison with the definition of R in f)3.10j) implies that the state x* is reachable by 
at most R transitions from any states. Also, w stands for the minimum non-vanishing 
value of P(y, x), 

w = min {P(y, x) \ P(y, x) > 0, x, y G S} . (3.19) 

Now, consider the transition probability from an arbitrary state a; to a;*. From the 
definitions of R and x*, there exists at least one transition route within R steps: 

x = x ^ Xi ^ x 2 ^ ■ ■ ■ ^ xi = x i+1 = ■ ■ ■ = x R = x* . 

Then Lemma ^ yields that, for sufficiently large t, the transition probability at each 
time step has the following lower bound: 

G(x i+1 , Xi ; t-R + i)> wg(l) exp (-^ - ^ j^Ti) ) ' (3 ' 20) 
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Thus, by taking the product of (|3.2Uj) from % = to % = R — 1, we have 
G t ' t ~ R (x* ) x) > G(x*, x R -x\ t - l)G(x R -x,x R -2] t - 2) • • • G(x t ,x; t - R) 
> II wg(l)exp (-^- - L> 1 



i=0 



T Tx(t-R + i) 



> ^ 9(1) « exp (-^ - ^) , (3,!, 

where we have used monotonicity of Tx(t). Consequently, it is possible to find an integer 
fco > such that, for all k > k , the coefficient of ergodicity satisfies 

1 - c( G — )) > «W«P (-^ - jT ^) . (3.22) 

We now substitute the annealing schedule (J3.13|) . Then weak ergodicity is immediately 
proved from Theorem ^ because we obtain 

CO , D T \ OO 1 

E(1 _ a{Gk R, k n- R)) > ^ (l) H exp (_^M ^ ^_ _ oc . (3.23) 
fc=l v J o / k=ko ^ti + 1 

Proof of Theorem^ To prove strong ergodicity, we refer to Theorem |21 The condition 
(i) has already been proved. As has been mentioned, the Boltzmann factor (j3.6J) satisfies 
q(t) = G(t)q(t), which is the condition (ii). Thus the proof will be complete if we prove 
the condition (iii) by setting p t = q{t). As shown in |Appendix B[ q(x; t) is monotonically 
increasing for large t: 

Vt > 0,Vx G Sf D : q(x;t + 1) > q{x;t), (3.24) 

3tx > 0,Vt > ti,Vx G5 \ : q(x;t + l) < q(x;t), (3.25) 

where denotes the set of global minimum states of Fx. Consequently, for all t > tx, 
we have 

\\q{t+l)-q(t)\\= E {q(x-t + l)-q(x;t)}- ]T t + 1) - q(x; t)} 

= 2 X! {5(a;;t+l)-g(x;t)}, (3.26) 
where we used = J2 x es™ in l( x 'j + X^s™ ^(^j — 1- We then obtain 

CO 

J2h(t + l)-q{t)\\ = 2 {q(x;oo)-q(x;t 1 )}<2. (3.27) 

t=ti ze5™ in 

Therefore q(t) satisfies the condition (iii): 

CO t\— 1 oo 

E + 1) - «(*)ll = E + 1) - + E + 1) - 

i=0 t=0 t=ti 

<2ti + 2<oo, (3.28) 
which completes the proof of strong ergodicity. 
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3.3. Remarks 

Remark 1. In the above analyses we treated systems with discrete degrees of freedom. 
Theorem El does not apply directly to a continuous system. Nevertheless, by 
discretization of the continuous space we obtain the following result. 

Let us consider a system of N distinguishable particles in a continuous space of 
finite volume with the Hamiltonian 

ff = ^)|>. 2 + ^M). (3-29) 

The mass m(t) controls the magnitude of quantum fluctuations. The goal is to find 
the minimum of the potential term, which is achieved by a gradual increase of m(t) to 
infinity according to the prescription of QA. After discretization of the continuous space 
(which is necessary anyway in any computer simulations with finite precision) and an 
application of the Suzuki- Trotter formula, the equilibrium partition function acquires 
the following expression in the representation to diagonalize spatial coordinates 



l '■•+!) _ ,.(*) 2 



(3.30) 



^)^ex P (--£v({r.< })-_gg| r| 

where we choose the unit % = 1. Theorem |3] is applicable to this system under the 
identification of Ti(t) with m(t)~ l . We therefore conclude that a logarithmic increase of 
the mass suffices to guarantee strong ergodicity of the potential-minimization problem 
under spatial discretization. 

The coefficient corresponding to the numerator of the right-hand side of ()3.13|) is 
estimated as 

RLi » M 2 NL 2 /(3, (3.31) 



where L denotes the maximum value of 



r (*+i) _ r W 



To obtain this coefficient, let us 



consider two extremes. One is that any states are reachable at one step. By definition, 

R = 1 and L x » M 2 NL 2 /(3, which yield fEOTJl . The other case is that only one particle 

can move to the nearest neighbour point at one time step. With a (<^C L) denoting the 

lattice spacing, we have 

Men, x9l MLa 
^^—{^-{L-af}^— . (3.32) 

Since the number of steps to reach any configurations is estimated as R ~ NML/a, we 
again obtain ()3.31|) . 



Remark 2. In Theorem El the acceptance probability is defined by the conventional 
Boltzmann form, (|3.5|) and (J3.6|) . However, we have the freedom to choose any transition 
(acceptance) probability as long as it is useful to achieve our objective since our goal 
is not to find finite-temperature equilibrium states but to identify the optimal state. 
There have been attempts to accelerate the annealing schedule in SA by modifying 
the transition probability. In particular Nishimori and Inoue |2Z1 have proved weak 
ergodicity of the inhomogeneous Markov chain for classical simulated annealing using 
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the probability of Tsallis and Stariolo [2Ej- There the property of weak ergodicity was 
shown to hold under the annealing schedule of temperature inversely proportional to 
a power of time steps. This annealing rate is much faster than the log-inverse law of 
Geman and Geman for the conventional Boltzmann factor. 

A similar generalization is possible for QA-PIMC by using the following modified 
acceptance probability 

Mv> x i t )=9(u(v,x;t)), (3.33) 

u(y,x;t) = e -C*M-*bW)/*b j x + {q _ l) Fl ^~ ^ , (3.34) 

where q is a real number. In the limit q — > 1, this acceptance probability reduces to the 
Boltzmann form. Similarly to the discussions leading to Theorem El we can prove that 
the inhomogeneous Markov chain with this acceptance probability is weakly ergodic if 

T '»^' 0<c ^ < M5 > 

where b is a positive constant. We have to restrict ourselves to the case q > 1 for 
a technical reason as was the case previously [27|. We do not reproduce the proof 
here because it is quite straightforward to generalize the discussions for Theorem |3] in 
combination with the argument of j2Zj. The result (J3.35)) applied to the TFIM is that, 
if the annealing schedule asymptotically satisfies 

r(t) > -j exp I I , (3.36) 

the inhomogeneous Markov chain is weakly ergodic. Notice that this annealing schedule 
is faster than the power law of ()3.15|) . We have been unable to prove strong ergodicity 
because we could not identify the stationary distribution for a fixed T\ (t) in the present 
case. 



4. Quantum annealing with Green's function Monte Carlo method 

The path-integral Monte Carlo simulates only the equilibrium behaviour at finite 
temperature because its starting point is the equilibrium partition function. Moreover, it 
follows an artificial time evolution of Monte Carlo dynamics, not the natural Schrodinger 
dynamics. An alternative approach to improve these points is the Green's function 
Monte Carlo (GFMC) method [2U I2H EDI • The basic idea is to solve the imaginary- 
time Schrodinger equation by stochastic processes. The Schrodinger dynamics with 
imaginary time has an extra advantage that one can reach the optimal state more 
efficiently than by real-time dynamics jSI] . Thus, for our purpose to solve optimization 
problems, it is more important to discuss imaginary-time Schrodinger equation than the 
"natural" real-time evolution. 

In the present section we derive sufficient conditions for strong ergodicity to hold 
in GFMC. 
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4-1. Green's function Monte Carlo method 

The evolution of states by the imaginary-time Schrodinger equation starting from an 
initial state |-?/>o) is expressed as 

\m) = Texp (- Jl^'H{t')) lV>o>, (4.1) 

where T is the time-ordering operator. The right-hand side can be decomposed into a 
product of small-time evolutions, 

|VK0> = Jim Go(tn-i)G (t„- 2 ) • ■■G (t 1 )G (t )\^ } } (4.2) 

where t k = kAt, At = t/n and Go(t) = 1 — At ■ H(t). In the GFMC, one approximates 
the right-hand side of this equation by a product with large but finite n and replaces 
Go(t) with Gi(t) = 1 — At(H(t) — Et), where Et is called the reference energy to 
be taken approximately close to the final ground-state energy. This subtraction of the 
reference energy simply adjusts the standard of energy and changes nothing physically. 
However, practically, this term is important to keep the matrix elements positive and 
to accelerate convergence to the ground state as will be explained shortly. 

To realize the process of (|4.2jl by a stochastic method, we rewrite this equation in 
a recursive form, 

^k+i(y) = J2 &(l/> x '> t k )ifj k (x), (4.3) 

X 

where if>k{ x ) — (^iV'fc) an d \x) denotes a basis state. The matrix element of Green's 
function is given by 

G 1 (y,x;t) = (y\l - At(H(t) - E T )\x). (4.4) 

Equation ()4.3)1 looks similar to a Markov process but is significantly different in 
several ways. An important difference is that the Green's function is not normalized, 
J2 y G\(y, x\ t) 7^ 1. In order to avoid this problem, one decomposes the Green's function 
into a normalized probability G\ and a weight w: 

Gi(y, x; t) = d{y, x; t)w(x; t), (4.5) 

where 

E y Gi{y,x;t) Gi{v,x;t) 
Thus, using (JUSJ), the wave function at time t is written as 

^n{y) = 5y,x n w{Xn-l] t n -x)w{x n - 2 \ tn-2) " • • w(x ] t ) 

t n _ 2 ) ■ ■ ■G 1 {x 1 ,x Q ;t Q )ip Q {x ). (4.7) 

The algorithm of GFMC is based on this formula and is defined by a weighted 
random walk in the following sense. One first prepares an arbitrary initial wave 
function ip (xo), all elements of which are non-negative. A random walker is generated, 
which sits initially (t = to) at the position xq with a probability proportional to 
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ipo(xo). Then the walker moves to a new position X\ following the transition probability 
G\{x\, x ; to). Thus this probability should be chosen non-negative by choosing 
parameters appropriately as described later. Simultaneously, the weight of this walker is 
updated by the rule W\ = w(x ; to) Wo with W = 1. This stochastic process is repeated 
to t — t n -i. One actually prepares M independent walkers and let those walkers follow 
the above process. Then, according to (|4.7|) . the wave function ip n (y) is approximated 
by the distribution of walkers at the final step weighted by W n , 

i M 

MV)= lim jTEA iWl (4.8) 

M— >oo M : 
1=1 

where i is the index of a walker. 

As noted above, Gi(y,x;t) should be non- negative, which is achieved by choosing 
sufficiently small At (i.e. sufficiently large n) and selecting Et within the instantaneous 
spectrum of the Hamiltonian H(t). In particular, when Et is close to the instantaneous 
ground-state energy of H(t) for large t (i.e. the final target energy), Gi(x,x;t) is close 
to unity whereas other matrix components of G\{t) are small. Thus, by choosing Et 
this way, one can accelerate convergence of GFMC to the optimal state in the last steps 
of the process. 

If we apply this general framework to the TFIM with the <r z -diagonal basis, the 
matrix elements of Green's function are immediately calculated as 

1 - At{E (x) - E T ) (x = y) 

Gi(y,x;t) = ^ AtT(t) (x and y differ by a single-spin flip) (4-9) 

(otherwise), 

where E (x) = (x\ (—J2ij Ji3°~t°~f) \ x )- O ne should choose At and Et such that 
1 — At(Eo(x) — Et) > for all x. Since w(x, t) = J2 y G±(y, x; t), the weight is given by 
w{x;t) = l-At{E (x) — E T ) + NAtT(t). (4.10) 

One can decompose this transition probability into the generation probability and the 
acceptance probability as in (|2.1jl : 

P(»,x) = (^ (Single " SPin (4.11) 
[ (otherwise) 

NAtV(t) 

Aly - *• t] = l-At ( E o(x) -E T ) + NAtT ( ty (412) 

We shall analyze the convergence properties of stochastic processes under these 
probabilities for TFIM. 



4-2. Convergence theorem for QA-GFMC 

Similarly to the QA by PIMC, it is necessary to reduce the strength of quantum 
fluctuations slowly enough in order to find the ground state in the GFMC. The following 
theorem provides a sufficient condition in this regard. 
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Theorem 4 (Strong ergodicity of QA-GFMC) The inhomogeneous Markov pro- 
cess of random walker for the QA-GFMC of TFIM, \2.1)) . fl^.i l\j and \4.12\) , is strongly 
ergodic if 

T(t) > , b : , 0<c<— . (4.13) 

The lower bound of the transition probability given in the following lemma will be 
used in the proof of Theorem 0] 

Lemma 2 The transition probability of random walk in the GFMC defined by \2.1)) . 
\4-ll ) and \4-12\ ) has the lower bound: 



P(y, ,) > => « > : Gl(v , x; t) > , _ ^j!^] + NAtr{t y (4-U) 

a,>0,Vt>t,=G,(,,,;«) > T -^-^ TWMWy (4.15) 

where E min is the minimum value of E (x) 

E min = mm{E (x)\xeS}. (4.16) 

Proof of Lemma [H The first part of Lemma |2 is trivial because the transition 
probability is an increasing function with respect to Eq(x) when P(y,x) > as seen 
in (I4.12|) . Next, we prove the second part of Lemma 121 According to (|4.9J1 and (|4.10|) . 

Gi(x,x;t) is written as 

G 1 (x, x;t) = l — , NAt ^ ArA w (4.17) 

v ' ' ; 1 — At(E (x) — E T ) + NAtT(t) v ; 

Since the transverse field T(t) decreases to zero with time, the second term on the right- 
hand side tends to zero as t — > 00. Thus, there exists t\ > such that Gi(x, x;t) > 1 — e 
for \/s > and Vt > t\. On the other hand, the right-hand side of ()4.15|) converges to 
zero as t —>■ 00. We therefore have (|4.15jl . 

Proof of Theorem^ We show that the condition 1)4.13)1 is sufficient to satisfy the three 
conditions of Theorem |21 

(i) From Lemma EJ we obtain a bound on the coefficient of ergodicity for sufficiently 
large k as 

< , r kN,kN-Ns ^ j AtT(kN-l) \ N 

l ~ a{Gl ^{l-AtiE^-E^ + NAtTikN-l)] ' (4 ' 18) 

in the same manner as we derived ()3.22|) . where we used R = N. Substituting the 
annealing schedule ()4.13|) . we can prove weak ergodicity from Theorem ^because 



00 



uN 

£ (l " *(G*™»-»)) > £ ^ (4.19) 



k=l k=ko 

which diverges when < c < 1/N. 
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ii) As shown in |Appendix C the stationary distribution of the instantaneous 



transition probability Gi(y,x;t) is 

a( x . t )= W{x;t) =- AtEo{x) (4 20) 

q[,) ~E x esw(x;t) 2 N 2 N {1 + At E T + NAtT(t)Y 1 UJ 

(iii) Since the transverse field T(t) decreases monotonically with t, the above 
stationary distribution q(x; t) is an increasing function of t if Eq(x) < and is decreasing 
if E > 0. Consequently, using the same procedure as in (j3.26|) . we have 

|| 5 (t + l)- 5 (t)||=2 ]T {q(x;t + l)-q(x;t)}, (4.21) 

E (x)<0 

and thus 

oo 

£||g(* + l)-<z(*)|| = 2 Yl {g(x;oo)-g(x;0)}<2. (4.22) 

t=0 E (x)<0 

Therefore the sum J2u=o + -0 ~~ <?WII i s finite, which completes the proof of the 
condition (iii). 

Remark. Theorem 0] asserts convergence of the distribution of random walkers to the 
equilibrium distribution ()4.20|) with T(t) — > 0. This implies that the final distribution 
is not delta-peaked at the ground state with minimum E (x) but is a relatively mild 
function of this energy. The optimality of the solution is achieved after one takes the 
weight factor w(x;t) into account: The repeated multiplication of weight factors as in 
()4.7|) . in conjunction with the relatively mild distribution coming from the product of 
G\ as mentioned above, leads to the asymptotically delta-peaked wave function ipn{y) 
because w(x; t) is larger for smaller -E'o(^) as seen in (|4.1U|) . 

4-3. Alternative choice of Green's function 

So far we have used the Green's function defined in ()4.4j) . which is linear in the transverse 
field, allowing single-spin flips only. It may be useful to consider another type of Green's 
function which accommodates multi-spin flips. Let us try the following form of Green's 
function, 

G 2 (t) = exp (AtT(t) ]T ofj exp Ut ]T J^a^j , (4.23) 

which is equal to Go(t) to the order At. The matrix element of G2 {t) in the cr z -diagonal 
basis is 

G 2 {y,x;t) = cosh 7V (Atr(t))tanh 5 (Atr(t))e- A * So(x) , (4.24) 

where 5 is the number of spins in different states in x and y. According to the scheme 
of GFMC, we decompose G2(y,x;t) into the normalized transition probability and the 
weight: 



cosh(AtT(t)) 

3 At r(t) 



N 



G 2 (y,x-t) = l ^^::^ JJ \ tanh 5 (Atr(i)), (4.25) 
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w 2 {x;t)=e AtNr ^e- AtEo(x \ (4.26) 

It is remarkable that the transition probability G 2 is independent of E (x). Thus, 
the stationary distribution of random walk is uniform. This property is lost if one 
interchanges the order of the two factors in ()4.23j) . 

The property of strong ergodicity can be shown to hold in this well: 

Theorem 5 (Strong ergodicity of QA-GFMC 2) The inhomogeneous Markov chain 
generated by is strongly ergodic if 

T(t) > log {l - 2b(t + l)- 1 ^} . (4.27) 

Remark. For sufficiently large t, the above annealing schedule is reduced to 

rw a *$TW - < 4 - 28) 

Since the proof is quite similar to the previous cases, we just outline the idea 
of the proof. The transition probability G 2 (y,x;t) becomes smallest when 5 = N. 
Consequently, the coefficient of ergodicity is estimated as 



1-«(G* +1 '')> 



I _ e -2Atr(t) } N 



2 J 

We note that R is equal to 1 in the present case because any states are reachable from 
an arbitrary state in a single step. From Theorem the condition 

fi_ e -2AtT(t)) N y 

{-^r-\ - tTi < 4 - 29 ' 

is sufficient for weak ergodicity. From this, one obtains (|4.27j) . Since the stationary 
distribution of G 2 (y, x; t) is uniform as mentioned above, strong ergodicity readily follows 
from Theorem El 

Similarly to the case of PIMC, we can discuss the convergence condition of QA- 
GFMC in systems with continuous degrees of freedom. The resulting sufficient condition 
is a logarithmic increase of the mass as will be shown now. The operator G 2 generated 
by the Hamiltonian (|3.29J) is written as 

^W=-p(-^g P ?)e— ««». (4.30, 
Thus, the Green's function is calculated in a discretized space as 

G 2 (y, x; t) oc exp (-^t £ \ r >. _ r tf _ ^tV{{r^ , (4.31) 

where x and y represent {ri} and respectively. Summation over y, i.e., integration 
over {r'j}, yields the weight w(x;t), from which the transition probability is obtained: 

w(x;t) oc e- Atvi{r * } \ (4.32) 
G 2 (y, x; t) oc exp £ \r[ - r 4 | j . (4.33) 
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The lower bound for the transition probability depends exponentially on the mass: 
G 2 (y, x;t) > e~ Cm ^ . Since 1 — a(G t 2 i ~ 1,t ) has the same lower bound, the sufficient 
condition for weak ergodicity is e~ Cm ^ > (t + l) -1 , which is rewritten as 



The constant C is proportional to NL 2 /At, where L denotes the maximum value of 
1 7*' — r\. The derivation of C is similar to (j3.31|) . because G^t) allows any transition to 
arbitrary states at one time step. 

5. Discussion 

We have proved strong ergodicity of the inhomogeneous Markov chains associated with 
QA-PIMC and QA-GFMC, mainly with the application to the TFIM in mind, which 
covers a wide range of combinatorial optimization problems. Our proof is quite general 
in the sense that it does not depend on the spatial dimensionality or the lattice structure 
of the system. The convergence of QA is guaranteed if the transverse field decreases as 
T(t) ~ const /t c asymptotically. This annealing schedule for the transverse field is faster 
than the temperature-annealing schedule, the log-inverse law, found by Geman and 
Geman for SA. Moreover, the generalized transition probability in PIMC accelerates the 
annealing schedule to T(t) « exp(— t c ) (although we could not prove strong ergodicity 
in this case). Since the constant c appearing in these formulas depends on the system 
size as 1/N and is therefore very small for large systems, our result may not provide 
practically useful guidelines to anneal the transverse field. This is the same situation 
as in SA, in which the temperature annealing should be N/ logt or slower to converge. 
Nevertheless our Theorems and Corollary represent quite non-trivial results because 
they assure eventual convergence of the system to the ground state (or a state near the 
ground state for PIMC) after non- stationary processes without being trapped in local 
minima. 

Let us write a few words on computational complexity. Although the annealing 
schedule of QA, the power-law dependence on t, is much faster than the log-inverse law 
for SA, this does not mean that QA provides an algorithm to solve NP problems in 
polynomial time. The time for T(t) to reach a sufficiently small value 5 is estimated 
from (j3.15|) as 



Since RL\ is of the order of N, the QA needs a time exponential in N to converge. 
An important point is that the coefficient of N in the exponent, (9(log5 _1 ), is much 
smaller than that for SA, in which the coefficient is 0(1/S) as can be seen from 
T(t) ~ N/ logt ~ 5. The situation is the same in QA-GFMC. If one uses the generalized 
transition probability, the corresponding time is 



which again shows exponential dependence on N with a much smaller coefficient. 



m(t) < C^logO+l). 



(4.34) 




(5.1) 




(5.2) 
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Appendix A. Proof of Lemma ^ 

The first part of Lemma Q is proved straightforwardly Equation (jH.lfiJ) follows directly 
from the definition of the transition probability and the property of the acceptance 
function g. When q(y; t)/q(x; t) < 1, we have 

On the other hand, if q(y;t)/q(x;t) > 1, 

G(y, x;t)>wg(l)>w g{\) exp (-^ - ^ J , (A.2) 

where we used the fact that both L and L\ are positive. 

Next, we prove (j3.17J) . Since x is not a member of S m , there exists a state y G S x 
such that Fi(y) — Fi(x) > 0. For such a state y, 

because Ti(t) tends to zero as t — > oo and < g{u) < u. Thus, for all e > 0, there 
exists ti > such that 

Vi > tl : g ( CTP (.m^JM _ ^iMgiM)) < , (A.4) 

We therefore have 

J2P(z,x)A(z,x;t) = P(y,x)A(y,x;t)+ J2 P(z,x)A(z,x;t) 
zes z£S\{y} 

<P(y,x)e+ P(z,x) 

zeS\{y} 

= l-(l-e)P(y,x), (A.5) 

and consequently, 

G{x, x; t) > (1 - e)P(y, x) > 0. (A.6) 

Since the right-hand side of ()3.17|) can be arbitrarily small for sufficiently large t, we 
obtain the second part of Lemma 
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We use the following notations: 

F (x) 



A{x) = exp 



B 



E M*) 



A{x) = -F x min . 

If x G iSf 1111 , the Boltzmann distribution can be rewritten as 

A(x) 



q(x;t) 



B + ^ exp 



A(i/) ' 
Ii(t). 



(B.l) 
(B.2) 

(B.3) 



My) 



Since A(?/) > by definition, the denominator decreases with time. Thus, we obtain 
To prove ([3.25)1 . we consider the derivative of q(x;t) with respect to Ti(t), 



dq(x; t) 



A(x)\bA(x)+ E (F 1 (x)-F 1 (y))e^(-P^-) 
\ y es\s™» ^ ^ >' 



My) 



T(t) 2 exp 



'A(xY 
Mt), 



^(y) 



(B.4) 



Only -Fi(x) — Fi(y) in the numerator has the possibility of being negative. However, the 
first term BA(x) in the curly brackets is larger than the second one for sufficient large 
t because exp (— A(y)/Ti(tj) tend to zero as — > oo. Thus there exists £i > such 
that dq(x; t)/dT{t) > for all t > t\. Since Ti(£) is a decreasing function of £, we have 
Of 



Appendix C. Proof of (jOUj) 



The transition probability defined by (|2.1|) . (|4.11|l and (|4.12jl is rewritten in terms of 
the weight (l4~Tn|) as 

NAtT(t) 



Gx(y,x;t) 



w(x; t) 
AtT(t) 
w(x] t) 




(x = y) 
(x G S y ] single-spin flip) 



(C.l) 



Thus, we have 



^2G 1 (y,x;t)q(x;t) = 1 - , 



(otherwise). 



NAtT(t)\ w(y;t) 



AtT(t) w{x;t) 



A 



x ^w(x;t) A 



NAtT(t) Atr(t) ^ 1 



.4 



.4 



(C.2) 



XGSy 
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where A denotes the normalization factor, 

w(x; t) = Tr J 1 - At f - £ J ij( j*a* -E T ]+ NAt F(t) 



= 2 N {l + AtE T + NAtT(t)} . (C.3) 

Since the volume of S y is N, (jC.2j) indicates that q(x; t) is the stationary distribution 
of Gi(y,x;t). The right-hand side of (|4.20J) is easily derived from the above equation. 
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